alternative hypothesis
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.63)
- North America > United States > Texas (0.04)
- North America > United States > Illinois (0.04)
- North America > United States > North Carolina (0.04)
- (2 more...)
Unified Unbiased Variance Estimation for MMD: Robust Finite-Sample Performance with Imbalanced Data and Exact Acceleration under Null and Alternative Hypotheses
Zhong, Shijie, Fu, Jiangfeng, Yang, Yikun
The maximum mean discrepancy (MMD) is a kernel-based nonparametric statistic for two-sample testing, whose inferential accuracy depends critically on variance characterization. Existing work provides various finite-sample estimators of the MMD variance, often differing under the null and alternative hypotheses and across balanced or imbalanced sampling schemes. In this paper, we study the variance of the MMD statistic through its U-statistic representation and Hoeffding decomposition, and establish a unified finite-sample characterization covering different hypotheses and sample configurations. Building on this analysis, we propose an exact acceleration method for the univariate case under the Laplacian kernel, which reduces the overall computational complexity from $\mathcal O(n^2)$ to $\mathcal O(n \log n)$.
- Asia > China > Shaanxi Province > Xi'an (0.04)
- North America > United States > North Carolina (0.04)
- Asia > Middle East > Jordan (0.04)
K-Nearest-Neighbor Local Sampling Based Conditional Independence Testing
Conditional independence (CI) testing is a fundamental task in statistics and machine learning, but its effectiveness is hindered by the challenges posed by high-dimensional conditioning variables and limited data samples. This article introduces a novel testing approach to address these challenges and enhance control of the type I error while achieving high power under alternative hypotheses. The proposed approach incorporates a computationally efficient classifier-based conditional mutual information (CMI) estimator, capable of capturing intricate dependence structures among variables. To approximate a distribution encoding the null hypothesis, a $k$-nearest-neighbor local sampling strategy is employed. An important advantage of this approach is its ability to operate without assumptions about distribution forms or feature dependencies. Furthermore, it eliminates the need to derive asymptotic null distributions for the estimated CMI and avoids dataset splitting, making it particularly suitable for small datasets.
Locally Most Powerful Bayesian Test for Out-of-Distribution Detection using Deep Generative Models
Several out-of-distribution (OOD) detection scores have been recently proposed for deep generative models because the direct use of the likelihood threshold for OOD detection has been shown to be problematic. In this paper, we propose a new OOD score based on a Bayesian hypothesis test called the locally most powerful Bayesian test (LMPBT). The LMPBT is locally most powerful in that the alternative hypothesis (the representative parameter for the OOD sample) is specified to maximize the probability that the Bayes factor exceeds the evidence threshold in favor of the alternative hypothesis provided that the parameter specified under the alternative hypothesis is in the neighborhood of the parameter specified under the null hypothesis. That is, under this neighborhood parameter condition, the test with the proposed alternative hypothesis maximizes the probability of correct detection of OOD samples. We also propose numerical strategies for more efficient and reliable computation of the LMPBT for practical application to deep generative models. Evaluations conducted of the OOD detection performance of the LMPBT on various benchmark datasets demonstrate its superior performance over existing OOD detection methods.
Machine-Learning-Assisted Comparison of Regression Functions
Yan, Jian, Li, Zhuoxi, Ning, Yang, Chen, Yong
We revisit the classical problem of comparing regression functions, a fundamental question in statistical inference with broad relevance to modern applications such as data integration, transfer learning, and causal inference. Existing approaches typically rely on smoothing techniques and are thus hindered by the curse of dimensionality. We propose a generalized notion of kernel-based conditional mean dependence that provides a new characterization of the null hypothesis of equal regression functions. Building on this reformulation, we develop two novel tests that leverage modern machine learning methods for flexible estimation. We establish the asymptotic properties of the test statistics, which hold under both fixed- and high-dimensional regimes. Unlike existing methods that often require restrictive distributional assumptions, our framework only imposes mild moment conditions. The efficacy of the proposed tests is demonstrated through extensive numerical studies.
- North America > United States > Pennsylvania (0.04)
- Asia > China > Hong Kong (0.04)
Anchor-based Maximum Discrepancy for Relative Similarity Testing
Zhou, Zhijian, Peng, Liuhua, Tian, Xunye, Liu, Feng
The relative similarity testing aims to determine which of the distributions, P or Q, is closer to an anchor distribution U. Existing kernel-based approaches often test the relative similarity with a fixed kernel in a manually specified alternative hypothesis, e.g., Q is closer to U than P. Although kernel selection is known to be important to kernel-based testing methods, the manually specified hypothesis poses a significant challenge for kernel selection in relative similarity testing: Once the hypothesis is specified first, we can always find a kernel such that the hypothesis is rejected. This challenge makes relative similarity testing ill-defined when we want to select a good kernel after the hypothesis is specified. In this paper, we cope with this challenge via learning a proper hypothesis and a kernel simultaneously, instead of learning a kernel after manually specifying the hypothesis. We propose an anchor-based maximum discrepancy (AMD), which defines the relative similarity as the maximum discrepancy between the distances of (U, P) and (U, Q) in a space of deep kernels. Based on AMD, our testing incorporates two phases. In Phase I, we estimate the AMD over the deep kernel space and infer the potential hypothesis. In Phase II, we assess the statistical significance of the potential hypothesis, where we propose a unified testing framework to derive thresholds for tests over different possible hypotheses from Phase I. Lastly, we validate our method theoretically and demonstrate its effectiveness via extensive experiments on benchmark datasets. Codes are publicly available at: https://github.com/zhijianzhouml/AMD.
- North America > United States > Hawaii (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- North America > Canada (0.04)
- (9 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Health & Medicine (0.67)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Data Science (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.45)
- North America > United States > Texas (0.04)
- North America > United States > Illinois (0.04)
- North America > United States > North Carolina (0.04)
- (3 more...)